-
Notifications
You must be signed in to change notification settings - Fork 145
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fill bad time intervals with fake data #782
Conversation
Hello @matteobachetti! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:
Comment last updated at 2024-01-11 09:46:05 UTC |
Codecov ReportAttention:
Additional details and impacted files@@ Coverage Diff @@
## main #782 +/- ##
==========================================
+ Coverage 96.31% 96.33% +0.02%
==========================================
Files 43 43
Lines 8497 8548 +51
==========================================
+ Hits 8184 8235 +51
Misses 313 313 ☔ View full report in Codecov by Sentry. |
9357c6a
to
b14a586
Compare
ac336a8
to
7ad8a61
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @matteobachetti,
this is a great additional tool for the software.
A few things:
In a lightcurve-like object, the filling-the-gap routine is easier to understand. We know the time resolution of the lightcurve and the routine creates time bins with a value of count / countrate.
In a StingrayTimeseries-like object, it is more complicated, how do you choose the arrival times that are used to fill the gap? From the docstring I read
"Random data are extracted by randomly repeating the values of nearby good data"
This is not extremely clear. If you repeat the values of the arrival time, you don't fill the empty gaps, do you?
More importantly, we should find a way to test that using the function fill_bad_time_intervals on a StingrayTimeseries and then making a lightcurve is equivalent to using the function fill_bad_time_intervals on the lightcurve made from the original StingrayTimeseries (with the gaps). I guess the two cases can't lead to the same numbers, because of the random function, but they should be close enough.
In a lightcurve-like object
- what happens if the gaps are smaller than the dt?
- can I fill only certain gaps and not others? For example, I want to fill the gaps in the first half of the light curve and not in the second half
- can we include the possibility of filling the gap with the linear interpolation between the two edges of GTIs adjacent to the BTI? Does it make sense?
Additional tests:
- test multiple gaps and not only one gap to fill
- some specific comments in the code
ev_like_filt.gti = np.asarray([[0, 498], [500, 900], [950, 1000]]) | ||
ev_new = ev_like_filt.fill_bad_time_intervals() | ||
|
||
assert np.allclose(ev_new.gti, self.gti) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In this case, what happens to the time array?
In principle, ev_new.time != ev_like_filt.time, because the BTI has been filled with random arrival times. right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep
7bf79e7
to
11cf78e
Compare
@mgullik thanks for the thorough review, which I tried to cover in my new changes. There are a few remaining questions which I haven't answered yet, that probably need some discussion:
I improved the docstring, explaining how uniformly and non-uniformly sampled data are treated differently. Basically, the only change is that times are assigned on a fixed grid for uniformly sampled, and randomized with the same countrate as in the buffer for non-uniformly sampled.
Why do you think this is important? A uniformly sampled time series should behave just like a light curve, and tests in both time series and light curves all pass independently.
The light curve machinery should cover this: bins partially outside GTIs are just treated as if they were outside GTIs.
I don't see a use case for this, what would be the application? I guess one could split the light curve, fill the GTIs in the first half, and then join the two chunks back together.
We could. Again, what would be the use case? While using random data tries to preserve the statistical properties of the data set, the linear interpolation would knowingly alter that.
Done
I think I addressed those one by one |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looking good! some comments and thoughts on the limits of what I can do, that I think should probably be written down somewhere, but not necessarily here (maybe in a tutorial?)
stingray/base.py
Outdated
---------------- | ||
max_length : float | ||
Maximum length of a bad time interval to be filled. If None, the criterion is bad | ||
time intervals shorter than 1/100th of the longest bad time interval. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Where does 1/100 come from? That seems maybe a bit arbitrary?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's actually 1% of the longest good time interval. It's just a small length, by default, so that we don't alter the statistical properties of the data too much
88468da
to
8248a04
Compare
8248a04
to
de2ee3e
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM! 👍 Assuming @mgullik is happy with it, too, this can get merged, I think?
Depends on #754Changes can be seen in action in StingraySoftware/notebooks#76
Also, resolve #612